Abstract

Background: Atypical adenomatous hyperplasia (AAH) and squamous cell dysplasia (SCD) are associated with the 
development of malignant lesions in the lung. Accurate diagnosis of AAH and SCD could facilitate earlier clinical 
intervention and provide useful information for assessing lung cancer risk in human populations. Detection of AAH 
and SCD has been achieved by imaging and bronchoscopy clinically, but sensitivity and specificity remain less than 
satisfactory. We utilized the ability of the immune system to identify lesion-specific proteins for detection of AAH
and SCD.

Methods: AAH and SCD tissue was surgically removed from six patients of Chinese descent (3 AAH and 3 SCD) 
with corresponding serum samples. Total RNA was extracted from the tissues and a cDNA library was generated 
and incorporated into a T7 bacteriophage vector. Following enrichment to remove "normal" reactive phages, a 
total of 200 AAH related and 200 SCD related phage clones were chosen for statistical classifier development 
and incorporation into a microarray. Microarray slides were tested with an independent double-blinded population 
consisting of 100 AAH subjects, 100 SCD subjects and 200 healthy control subjects. 

Results: Sensitivity of 82% and specificity of 70% were achieved in the detection of AAH using a combination of 9 
autoantibody biomarkers. Likewise, 86% sensitivity and 78% specificity were achieved in the detection of SCD using 
a combination of 13 SCD-associated markers. Sequencing analysis identified that most of these 22 autoantibody 
biomarkers had known malignant associations. 

Conclusions: Both classifiers showed promising sensitivity and specificity in the detection of pre-neoplastic
lung lesions. Hence, this technology could be a useful non-invasive tool to assess lung cancer risk in human
populations.

Background

Lung carcinoma is the leading cause of morbidity
and mortality among human solid cancers worldwide. It
accounted for around 12.7% of all new cancer cases
and 18.2% of all cancer mortality, or approximately 1.4
million deaths, worldwide in 2008 [1]. In populations
with long-term cigarette use, the proportion of lung
cancer cases attributable to smoking approaches 90% [2]. In
Asia, particularly in China, rising smoking rates continue
to drive up the incidence of lung cancer.

Despite advances in therapy, early detection of lung
carcinoma remains critical to facilitate successful treatment and
increase the chances of survival. Five-year survival rates
for non-small cell lung cancer diagnosed between
1990 and 2001 were reported to be 61% for
stage IA, compared with 34%, 13% and 1% for stages IIA,
IIIA and IV, respectively [3]. Thus, novel techniques
that facilitate accurate diagnosis of early malignancy or
pre-malignancy would be of great benefit in improving
survival rates.

Atypical adenomatous hyperplasia (AAH) and squamous
cell dysplasia (SCD) have been reported as precursors
of adenocarcinoma and squamous cell carcinoma of
the lung, respectively [4,5]. The progression of healthy
tissue to pre-neoplastic lesions and malignant disease
has been described previously by Wistuba and Gazdar
[6]. Briefly, with respect to squamous cell carcinoma,
loss of heterozygosity (LOH) on chromosomes 3p21 and
9p21 can lead to telomerase dysfunction. DNA methylation
in cell cycle regulatory genes such as p16INK4a,
and further LOH in tumor suppressor genes such as
FHIT and TP53, contribute further to the dysregulation
of cell growth in the epithelial tissue, ultimately leading
to carcinoma in situ and metastatic disease. With respect
to adenocarcinoma, mutations in the ras oncogene
pathway (smokers) or the epidermal growth factor receptor
pathway (non-smokers) can lead to tissue growth
dysregulation, and ultimately adenocarcinoma.

Currently, the most widely used clinical methods for
the detection of pre-neoplastic lung SCD lesions are
white light bronchoscopy (WLB) and auto-fluorescence
bronchoscopy (AFB). Chen et al. [7] reviewed 492 publications,
of which 14 studies (15 data sets) contained data suitable
for a meta-analysis comparing the sensitivity and specificity
of WLB and AFB (results were confirmed by tissue histology).
The pooled sensitivity and specificity of AFB were 0.90
(95% CI 0.84-0.93) and 0.56 (95% CI 0.45-0.66), compared to 0.66
(95% CI 0.58-0.73) and 0.69 (95% CI 0.57-0.79) for
WLB. Thus, AFB was reported to be superior to WLB
for the detection of lung cancer and pre-neoplastic SCD
lesions. The technique does, however, have some limitations
with respect to deployment in large clinical studies
and population studies. AFB requires specialized equipment
and trained operators to carry out the procedure
and interpret the data. In addition, the procedure itself
involves the insertion of an endoscope into the lungs,
and hence is fairly invasive
relative to a blood sample.

AAH arises in the peripheral lung parenchyma, which
is mainly involved in gas exchange at the alveolar
level. AAH lesions are usually small, asymptomatic
and radiologically invisible; helical (spiral) CT combined
with surgical biopsy is the most commonly reported
method for diagnosis of AAH. At an almost indistinguishable
point, AAH lesions can become bronchoalveolar
cell carcinoma (BAC) [8]. Clinically, it has been
suggested that lesion size be used to distinguish between
AAH (< 5 mm) and BAC/adenocarcinoma (> 5 mm)
[8]. Despite some successes in lung cancer detection when
screening high-risk individuals, difficulties
still exist in differentiating bronchoalveolar
carcinoma, adenocarcinoma and AAH with helical CT
alone, and false-positive rates for malignancy diagnoses are
20-35% [9,10].

Departing from these more invasive detection procedures,
the use of blood-borne biomarkers for early cancer detection
has generated great interest. Detection of serum
antibodies against tumor-associated antigens (TAAs)
may provide more reliable information for early cancer
diagnosis [11,12]. The immune system is sensitive
enough to detect very low levels of TAAs, which may
originate in only a few neoplastic cells, by generating very
high affinity T cells and antibodies [13]. Autoantibodies
to p53 have been reported in patients with early stage
ovarian or colorectal cancers [13], and a panel of serum
antibodies can detect non-small cell lung cancer
(NSCLC) 5 years prior to autoradiograph detection [14].
Thirty percent of patients with ductal carcinoma in situ
in which the proto-oncogene HER-2/neu is overexpressed
have serum antibodies specific to this protein
[15]. Therefore, it is logical and practical to employ the
body's endogenous immune system as a natural "amplification
strategy" to detect pre-neoplastic lesions of
the lung.

The current study utilizes lesion-specific autoanti- 
bodies present in blood samples to detect the presence 
of AAH and SCD lung lesions. This approach could be a 
useful non-invasive tool to assess lung cancer risk in hu- 
man populations. 

Results 

Biopanning the phage libraries 

Two T7 cDNA phage libraries were constructed using
pooled AAH or SCD tissues. The libraries
were titered by plaque assay and found to contain 5.5 x 10^
primary recombinants for the AAH library and 3.8 x 10^
primary recombinants for the SCD phage library. A PCR
test was then run to determine the sizes of individual
phage inserts using T7 primers, and the results showed
that the cDNA inserts from both libraries
ranged from 0.5 to 2 kb, indicating that both were good phage
libraries.

The two libraries were then pooled and biopanned
using pooled AAH/SCD and normal serum
samples to screen for potential tumor-associated antigens
expressed by the T7 cDNA phage libraries. Using patient
sera as a source of primary antibody, immunodetection
revealed that repeated cycles of panning and phage amplification
yielded an enriched population of immunoreactive
phages (Figure 1). Comparison between duplicate
plaque lifts of biopan 4, incubated with the pooled patient
sera or normal sera used in the biopan, showed the
ability of this process to select immunogenic phage-expressed
proteins (Figure 1). Immunoreactivity of multiple
clones with patient sera indicated high affinity
binding of these phage-expressed proteins to antibodies
in patient sera. In contrast, these same clones were
not immunoreactive on the identical membrane incubated
with normal serum. The background seen on the
membrane incubated with the normal sera was considered
nonspecific reactivity with phage proteins.

Figure 1 Biopanning enrichment of immunogenic tumor-associated proteins. To confirm the enrichment of the biopanning, the outputs
of biopans 1-4 (BP1-BP4) were plated onto LB-Agar plates in limiting dilution and plaque lifts were performed. Panels (A) and (B) compare
an earlier biopan output with that of BP4: the number of immunoreactive phage clones increased, illustrating the ability of the sequential
biopans to enrich the concentration of tumor-associated proteins recognized by antibodies in patient serum. To confirm the specificity of
enriched phage proteins, two nitrocellulose membranes were lifted from the same LB-Agar plate of BP4. One membrane was probed with pooled
AAH/SCD patient sera (D) while the other was probed with pooled normal sera (C). Numerous immunoreactive clones appear on the membrane
incubated with patient sera (D) that are not seen on the identical membrane incubated with normal serum (C).



High-throughput screening 

A total of 4000 phage clones were randomly selected
from the output of the BP4 phage library and
spotted on membrane-coated microarray slides. These phage
clones were then screened with 5 individual AAH or 5
individual SCD patient serum samples that had not been
used in the biopan, to identify disease-associated
immunogenic phages. Linear regression of the Cy5:Cy3 signal
revealed that 238 individual phages from the AAH
screening and 193 individual phages from the SCD
screening had signal ratios greater than 2 standard deviations
from the average; these were chosen as candidates
for construction of a "diagnostic chip". An example of the linear
Cy5:Cy3 regression used for phage selection from a
screening chip is shown in Figure 2.
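
The regression-based selection described above can be illustrated with a minimal sketch. It assumes that the Cy5 channel reports serum antibody binding and the Cy3 channel the amount of phage spotted (the text does not state this explicitly), and that a spot is retained when its Cy5 signal lies more than 2 residual standard deviations above the fitted line. The function names and the toy data are hypothetical and are not taken from the study.

```python
from statistics import mean, pstdev


def fit_line(x, y):
    """Ordinary least-squares fit y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def select_candidates(cy3, cy5, n_sd=2.0):
    """Indices of spots whose Cy5 signal lies more than n_sd residual
    standard deviations above the Cy5-on-Cy3 regression line."""
    intercept, slope = fit_line(cy3, cy5)
    residuals = [y - (intercept + slope * x) for x, y in zip(cy3, cy5)]
    cutoff = n_sd * pstdev(residuals)
    return [i for i, r in enumerate(residuals) if r > cutoff]


# Toy data (hypothetical): 20 spots on the diagonal, one with elevated Cy5.
cy3 = [100.0 * (i + 1) for i in range(20)]
cy5 = list(cy3)
cy5[9] += 5000.0
print(select_candidates(cy3, cy5))
```

On real slides the cut-off would be applied per serum sample, and a robust fit may be preferable because strongly reactive spots inflate the residual standard deviation.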

Statistical analysis in training test 

Four hundred immunoreactive phages identified by
high-throughput screening (described above), plus 200
"empty" T7 phages, were combined, re-amplified, and
spotted in duplicate onto membrane-coated slides
(Schott, Germany) as diagnostic chips. Replicate chips
were used to assess 50 AAH and 50 SCD serum samples, along with 100
control serum samples, as a training test. Normalized
Cy5:Cy3 ratios for each of the 400 phage clones were independently
analyzed for statistical significance between
patient and normal samples. A Student's t test was run
with a statistical cut-off (P < 0.01) to indicate the
relative predictive value of each candidate marker. Of
the 400 candidate phage proteins, 47 in the AAH test and 39 in the
SCD test met the cut-off (P < 0.01).

Figure 2 High-throughput screening of tumor-associated phage proteins. Biopanned phage clones were spotted on microarray slides
and tested with patient serum samples. The partial array images show reactivity patterns for (A) AAH and (C) SCD patient samples, and the
corresponding scatter plots show possible disease-associated phage clones (X-axis is Cy3 signal, Y-axis is Cy5 signal) in (B) and (D) respectively.
The computer-generated regression line and standard deviation lines on the scatter plots assist in identifying candidate marker phages (yellow
dots) for diagnostic chip construction.

Based on the P values, the top 30 phage clones were
collected, combined and then analyzed using a logistic
regression algorithm to find the optimal sensitivity and
specificity in distinguishing patients from normal controls. The
highest predictive accuracy in the AAH test was achieved with
a sensitivity of 92.3% and a specificity of 90.2% using 9 combined
markers. These results were further validated
using leave-one-out cross-validation (LOOCV) between
patient and control samples, and ROC curves were generated,
yielding an AUC = 0.874. Therefore, the combination
of 9 phage proteins was the most accurate and
stable classifier for prediction of AAH samples. The same
analysis was used for the SCD test, and the result
showed that the optimal sensitivity and specificity
in distinguishing SCD from control samples were 98.3%
and 95.6%, respectively. These results were further validated
using leave-one-out cross-validation (LOOCV) between
patient and control samples, and ROC curves
based on logistic regression were generated, yielding
an AUC = 0.959. Therefore, a combination of 13 phage
proteins was the most accurate and stable classifier for
prediction of SCD samples.
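
As an illustration of the validation procedure named above, the following minimal sketch combines a logistic regression model, leave-one-out cross-validation and a rank-based AUC. It is not the software used in the study: the gradient-descent fit, the learning rate, the number of epochs, all function names and the single-marker toy data are assumptions made for the example, whereas the classifiers in the study combined 9 or 13 markers.

```python
import math


def predict_proba(weights, x):
    """Logistic model probability; weights[0] is the intercept."""
    z = weights[0] + sum(w * v for w, v in zip(weights[1:], x))
    z = max(-30.0, min(30.0, z))  # clamp to keep exp() well-behaved
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, lr=0.1, epochs=500):
    """Fit logistic regression by plain batch gradient descent."""
    weights = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(weights)
        for xi, yi in zip(X, y):
            err = predict_proba(weights, xi) - yi
            grad[0] += err
            for j, v in enumerate(xi):
                grad[j + 1] += err * v
        weights = [w - lr * g / len(X) for w, g in zip(weights, grad)]
    return weights


def loocv_scores(X, y):
    """Score each sample with a model trained on all the other samples."""
    scores = []
    for i in range(len(X)):
        model = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        scores.append(predict_proba(model, X[i]))
    return scores


def auc(scores, labels):
    """Area under the ROC curve as the fraction of correctly ranked
    (patient, control) pairs; ties count one half."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical single-marker toy data: 3 controls (0) and 3 patients (1).
X = [[-2.0], [-1.5], [-1.0], [1.0], [1.5], [2.0]]
y = [0, 0, 0, 1, 1, 1]
held_out = loocv_scores(X, y)
print(round(auc(held_out, y), 3))
```

Because every sample is scored by a model that never saw it, the AUC computed from the held-out scores is less optimistic than one computed on the training fit itself.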



Independent validation test 

Using the same "diagnostic chips" developed in the
training test, an independent cohort of 400 serum samples
consisting of 200 controls, 100 AAH and 100 SCD
serum samples was assayed. The samples were tested in
a blinded fashion, and each sample's status was calculated
separately using the AAH or SCD classifier. The
final results were checked against the true statuses of the
samples, and the sensitivities and specificities were then
calculated. Overall, we correctly predicted 168 patients
(AAH and SCD) out of 200 disease samples (a sensitivity
of 84%), and correctly predicted 156 out of 200 control samples
(a specificity of 78%) (Table 1). Corresponding ROC
curves and the value of the area under the curve (AUC)
were calculated, with AUC = 0.81 and AUC = 0.88 for AAH
and SCD detection, respectively (Figure 3). As shown in
Table 1, the two classifiers overlap in predicting AAH
and SCD samples, even though the sensitivities were rather
different. In other words, the same sample could be
identified as either AAH or SCD by the two classifiers.
Therefore, further confirmation by radio-imaging or
bronchoscopy may be needed in the clinic.

Table 1 Diagnostic accuracies using the two classifiers in the validation test

                                 Control samples   AAH samples   SCD samples
                                 (n = 200)         (n = 100)     (n = 100)
AAH classifier prediction rate   140/200           82/100        61/100
  Sensitivity                                      82%           61%
  Specificity                    70%
SCD classifier prediction rate   156/200           66/100        86/100
  Sensitivity                                      66%           86%
  Specificity                    78%

Figure 3 Diagnostic accuracy for predicting the pre-neoplastic samples in validation test. Classifiers generated in the training tests were
used to predict the samples' statuses in the blinded validation test. Corresponding receiver operating characteristics (ROC) curves and the value
of the area under the curve (AUC) were calculated. Panel A was generated in the validation test based on the values from 100 AAH samples and
100 control samples. Panel B was generated based on the values from 100 SCD samples and 100 control samples.
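
The sensitivities and specificities reported for the validation cohort follow directly from the counts in Table 1. The short sketch below reproduces that arithmetic; the helper name `percent` is ours, and the counts are those given in the text and table.

```python
def percent(correct, total):
    """Proportion of correctly classified samples, as a percentage."""
    return 100.0 * correct / total


# Counts from Table 1 (validation cohort).
aah_sensitivity = percent(82, 100)    # AAH samples called positive by the AAH classifier
aah_specificity = percent(140, 200)   # controls called negative by the AAH classifier
scd_sensitivity = percent(86, 100)    # SCD samples called positive by the SCD classifier
scd_specificity = percent(156, 200)   # controls called negative by the SCD classifier

# Overall sensitivity as quoted in the text: each disease sample is
# counted with its own classifier (82 AAH + 86 SCD out of 200).
overall_sensitivity = percent(82 + 86, 200)

print(aah_sensitivity, aah_specificity, scd_sensitivity, scd_specificity,
      overall_sensitivity)
```

Note that the overall specificity of 78% quoted in the text corresponds to the control result of the SCD classifier alone.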

Detection of early malignancy in the lung 

The World Health Organization has defined SCD as
a pre-neoplastic lesion for squamous cell carcinoma
(SCC) [2], one step earlier than carcinoma in
situ (CIS, stage 0 of SCC). Likewise, AAH is the
progenitor lesion for adenocarcinoma and may progress
further to become a bronchoalveolar cell carcinoma
(BAC, stage 0 of adenocarcinoma) [6]. In many
cases, SCD and CIS, and AAH and BAC, are coexistent.
Therefore, by separately analyzing the two pairs we may
test the ability of the classifiers to detect stage 0
non-small cell lung cancer (NSCLC). According to
our samples' pathology reports, there were 39 AAH samples
that also had BAC, and 42 SCD samples that coexisted
with CIS. Using the AAH or SCD classifiers, 30/39
(sensitivity of 76.9%) of AAH/BAC samples were identified
by the AAH classifier, while 37/42 (sensitivity of 88.1%)
of SCD/CIS samples were identified by the SCD classifier
(Table 2). The AAH classifier showed a slight decrease
in sensitivity for AAH/BAC samples, while the SCD classifier
showed a slight increase for SCD/CIS samples.
To increase the detection accuracy for
early stage NSCLC, it may be necessary to incorporate some
NSCLC-specific biomarkers into the present classifiers.
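
Because the stage 0 subgroups are small (39 and 42 samples), the sampling uncertainty around these sensitivities is worth keeping in mind. The sketch below computes a 95% Wilson score interval for each subgroup. The interval is our choice of illustration and was not part of the reported analysis; the function name is hypothetical.

```python
import math


def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a binomial proportion (z = 1.96 for 95%)."""
    p = successes / n
    denom = 1.0 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return centre - half, centre + half


# Subgroup counts quoted in the text.
aah_bac_low, aah_bac_high = wilson_interval(30, 39)   # AAH/BAC, AAH classifier
scd_cis_low, scd_cis_high = wilson_interval(37, 42)   # SCD/CIS, SCD classifier

print(round(aah_bac_low, 3), round(aah_bac_high, 3))
print(round(scd_cis_low, 3), round(scd_cis_high, 3))
```

Under these assumptions, each interval contains the corresponding whole-cohort sensitivity (82% for AAH, 86% for SCD), which is consistent with describing the subgroup differences as slight.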



Table 2 The accuracies of the classifiers in detection of stage 0 NSCLC

                 AAH samples   AAH/BAC samples   SCD samples   SCD/CIS samples
                 (n = 61)      (n = 39)          (n = 58)      (n = 42)
AAH classifier   52/61         30/39
SCD classifier                                   49/58         37/42
Sensitivity      85.2%         76.9%             84.5%         88.1%

Table 3 Identities of 9 selected phage proteins for AAH classifier construction and 13 phage proteins for SCD classifier construction

AAH              SCD
LTBP1*†          LTBP1*†
BMI1*†           BMI1*†
GAGE7*†          GAGE7*†
AGBL5            NRP1*
HES1*            ADPGK
PDE4A*           PRDX4*
NEFH*            KDELR1
HSPA8*           SEC61β*
cDNA FLJ45990    FZD6TFIP11
                 LGR6*
                 TXNL2*
                 KLK2*
                 BIRC3*

*Proteins with known malignant association. †Proteins present in both the AAH and SCD classifiers.



Sequence analysis of phage-expressed proteins 

The 22 phages (9 for AAH and 13 for SCD) that were
chosen for either AAH or SCD classifier development
were sequenced. Although the identities of the phage-expressed
proteins were not critical for use in a diagnostic
assay, the sequences were compared with the
GenBank database to determine possible identities. Sequence
analysis showed that some proteins are known to
be associated with cancer while others have no known
association with cancer. Detailed identities are listed in
Table 3. Interestingly, 3 identical proteins
were selected for both classifiers,
which may explain the overlap in prediction of disease
status. Since these 3 markers were weighted heavily in
the logistic regression model, the decision was taken to
keep these proteins in the array.

Discussion 

Some lung cancers, such as squamous cell carcinoma
and adenocarcinoma, appear to develop over years or decades
via a series of progressive morphological changes
with correlating molecular alterations [6]. The 2004
WHO classification recognises several pre-neoplastic
precursor lesions for lung cancer [2]. These include SCD
and AAH for squamous cell carcinoma and adenocarcinoma,
respectively. These aberrant alterations can be detected
by the immune system, with corresponding
autoantibody production [13]. In this study, a blood-based
autoantibody test was successfully developed using
phage-display and protein microarray techniques. The
current autoantibody chip achieved a sensitivity and specificity
of 82% and 70% for AAH, and 86% and 78% for
SCD detection, respectively. The pooled sensitivity
and specificity in a meta-analysis of 14 studies by
Chen et al. [7] were 90% and 56% for AFB and 66% and
69% for WLB, respectively, in detection of pre-neoplastic
lung lesions. Thus, in the validation cohort the autoantibody chip
out-performed the pooled WLB estimates. With respect to AFB,
the autoantibody chip was
not quite as sensitive (but comparable) and had a higher
specificity. The specificities of the AAH and SCD classifiers
were lower than expected (based upon expectations
of application of the technique for the detection of
NSCLC). One possible explanation is the presence
of AAH or SCD lesions in the control population used
for the validation study. If some control subjects harbored
these lesions, this would reduce the specificity
of the biomarker panel used in the array. To
increase the specificity of the classifiers, control subjects
could be screened for pre-neoplastic lesions to confirm
their "lesion free" status.

Tumor auto-antibody array proteins 

The lesion-specific proteins utilized in the array are shown
in Table 3. For AAH lesions, 9 proteins gave the best sensitivity
and specificity for the array. Following sequencing
and identification, 7 of the 9 proteins have known malignant
associations from literature reports. Dysregulation of
latent transforming growth factor beta binding protein 1 (LTBP1) has
been identified in asbestos-related lung tumors and was
associated with DNA copy number alterations and tumor-associated
miRNAs [16]. Furthermore, LTBP1 may play a
key role in the pathogenesis of mesothelioma via regulation
of TGFβ activation in tumor tissue [17]. In healthy individuals,
the expression of GAGE occurs in germ cells
only [18]. However, GAGE proteins are expressed in a 
wide range of cancers, including stomach cancer, esopha- 
geal carcinoma and neuroblastoma, indicating a role in 
tumorigenesis [19]. GAGE7 is known for its anti-apoptotic
action in human tumor cells, conferring resistance to
FasL-mediated apoptosis [20]. BMI1 expression has been
associated with tumor invasion and metastasis in lung and
breast carcinoma [21,22]. In malignant lung tissue, levels
of the cell adhesion molecule E-cadherin were inversely
correlated with BMI1 [21]. Furthermore, in vitro studies
showed that cigarette smoke condensate exposure signifi- 
cantly repressed miR-487b in normal respiratory epithelial 
cells and lung cancer cell lines. Subsequent experiments 
demonstrated that miR-487b directly targeted SUZ12, 
BMI1, WNT5A, MYC, and KRAS. Repression of miR-487b
correlated with overexpression of these targets in
primary lung cancers [23]. HES1 is a downstream effector
protein of the NOTCH pathway. Westhoff et al. [24] reported
that the NOTCH pathway has been shown to be
altered in approximately one third of NSCLCs and that ac- 
tivation of the NOTCH pathway is associated with cell 
survival and poor clinical outcome. Interestingly, micro- 
array analysis of human bronchial epithelial cells from 
smokers and smokers with COPD demonstrated that 45 
of 55 Notch-related genes are expressed in the small air- 
way epithelium, and that down-regulation of NOTCH 
pathway gene expression was associated with smoking and 
COPD [25]. Phosphodiesterase 4A (PDE4A) has been implicated
in epithelial mesenchymal transition (EMT) of
lung epithelial cells. Upregulation of this enzyme occurred
in response to induction of EMT by treatment of A549
lung epithelial cells with TGFβ [26]. Furthermore, increased
PDE4A expression and activity were associated
with cell proliferation and angiogenesis [27]. NEFH [28]
and HSPA8 [29,30] have also been associated with human
tumorigenesis; however, there are no reports suggesting that
AGBL5 (an ATP/GTP binding protein) has a malignant association
outside of the current study.

With respect to SCD, a combination of 13 proteins 
gave the best sensitivity and specificity in the array. 
LTBP1, GAGE7 and BMI1 were also present in the SCD
panel, highlighting that SCD lesions potentially share
similar biology with AAH lesions. Neuropilin-1 (NRP1)
is a co-receptor for vascular endothelial growth factor
(VEGF), and hence is associated with angiogenesis and
tumor growth. NRP1 expression was assessed in whole
sections of 65 primary breast carcinomas, 95 primary
colorectal adenocarcinomas, and 90 primary lung carcinomas.
Immunoreactivity for NRP1 was seen in vessels
from normal tissues adjacent to cancer and in 98-100%
of carcinomas. Tumor cell expression of NRP1 was also
observed in 36% of primary lung carcinomas and 6% of
primary breast carcinomas, but in no colorectal adenocarcinomas
[31]. Due to the lower overall survival rate in
NSCLC patients who expressed high levels of NRP1 in
tumors, NRP1 has been proposed as a potential drug
target for treatment of NSCLC [32]. Thioredoxin-like 2 
(TXNL2) is a redox protein that has been reported to 
regulate cell growth and metastasis in breast cancer 
cells, and increased expression was associated with lower 
patient survival rates [33]. As tumor cells are known to 
produce higher levels of reactive oxygen species (ROS), 
proteins like TXNL2 help the cell to counteract these 
ROS, and hence continue to survive [33]. Conversely, in- 
creased mRNA expression of TXNL2 was correlated 
with prolonged patient survival following surgery of 
colorectal cancer [34]. Kallikrein-2 (KLK2) is a member 
of the Kallikrein superfamily of serine proteases, which 
are considered putative biomarkers for the screening, 
diagnosis, prognosis, and monitoring of various cancers 
including those of the prostate, ovaries, breast, testicles, 
and lung [35]. The most well-known kallikrein is 
prostate-specific antigen (KLK3), which is used clinically 
to diagnose human prostate cancer. KLKs are expressed 
in many human tissues and are regulated by steroid hor- 
mones [36]. Peroxiredoxin 4 (PRDX4) is a redox protein 
located in the endoplasmic reticulum and is a proposed 
scavenger enzyme for H2O2 [37]. PRDX4 has been pro- 
posed as a biomarker of oxidative stress and has been 
associated with high risk of cardiovascular disease 
(CVD) and CVD mortality [38]. Studies in endothelial
cells of patients with NSCLC showed increased expression
of PRDX4 in tumors, which was not the case in
adjacent normal tissue. Furthermore, elevated expression
of PRDX4 was present in the epithelial cells of the tumors
[39]. Baculoviral IAP repeat containing 3 (BIRC3)
is a member of the IAP family of proteins, which inhibit
apoptosis by binding to tumor necrosis factor receptor-associated
factors TRAF1 and TRAF2 [40]. Due to the
ability of these proteins to inhibit apoptosis via the NF-κB
pathway, IAPs have been proposed as potential target
molecules for anti-cancer therapeutics [41,42]. Nuclear
over-expression of IAP1 was strongly correlated with
tumor stage/grade and poorer prognosis in bladder
cancer patients [43]. Elevated expression of specific
members of the IAP family may be tumor specific, as
studies in breast cancer cells showed elevated expression
of cIAP2, whereas other members of the IAP family
(cIAP1 and XIAP) were not upregulated [44]. Endoplasmic
reticulum protein retention receptor 1 (KDELR1)
has been implicated in the initiation of pro-apoptotic 
endoplasmic reticulum stress responses [45]. Data relating
to its involvement in tumorigenesis are currently
very limited. Studies by Yi et al. [46] have reported that
NPAS2 (involved in regulation of circadian rhythm)
is associated with risk of breast cancer, prostate
cancer and non-Hodgkin's lymphoma. In vitro studies in the
breast cancer cell line MCF-7 by the same group
showed that KDELR1 is a transcriptional target of
NPAS2 [46]; however, no studies to date have demonstrated
a direct malignant association. The Sec61 complex
is the central component of the protein transloca- 
tion apparatus of the endoplasmic reticulum (ER) mem- 
brane, and is involved in the translocation of proteins 
into the ER membrane, notably the EGF receptor [47]. 
SEC61β expression was elevated 1.9-fold in human
colorectal tumors and was proposed as a biomarker for
the early detection of colorectal cancer. Furthermore,
serum auto-antibodies to SEC61β were proposed as
a surrogate biomarker for tissue SEC61β, achieving a
sensitivity and specificity of 79% and 75%, respectively
[48]. Leucine-rich repeat containing G protein-coupled 
receptor 6 (LGR6) is a member of the rhodopsin-like 
seven transmembrane domain receptor superfamily and 
mutations in this gene have been reported in colon can- 
cer tissue [49] and gastric carcinoma [50]. Mechanistic 
studies have demonstrated that LGR6 is a high affinity 
receptor for R-spondins 1-3 and potentially functions 
as a tumor suppressor [49]. Finally, with respect to 
FZD6TFIP11, there are no published data to date on 
this protein. 

Considerations for use of the array 

The use of tobacco, especially cigarette smoking, has
long been considered the leading cause of small cell
and non-small cell lung cancer, contributing to 80% and
90% of lung cancer deaths in women and men,
respectively [51]. However, not all smokers will develop
lung cancer; only about 15% of heavy smokers are diagnosed
with lung cancer in their lifetimes [52]. Therefore,
it is critical to identify the individuals within high-risk
smoking populations who have the potential to develop
lung cancer. The autoantibody chip developed in this study
could be a good approach to identifying pre-neoplastic
lung lesions in high-risk populations. By combining it
with the current imaging and endoscopy techniques,
affected individuals could be identified earlier and hence
receive appropriate medical guidance and monitoring to
reduce their risk of developing lung cancer.

Three key limitations of the auto-antibody array need 
to be considered prior to its use in population studies. 
The first of these is that the classifiers for AAH and 
SCD do overlap and furthermore, share similarities with 
diseased tissue (Tables 1, 2 & 3). The overlap is to be ex- 
pected however, as the dysregulation observed at the 
pre-neoplastic level is likely to share common irregular- 
ities and also to be causative of the malignant disease, 
and hence present in malignant tissue [6]. In order to 
measure the progression/regression of pre-neoplastic 
lung lesions in human population studies with this tech- 
nology, it would be necessary to confirm that study par- 
ticipants do not harbor malignant disease (all cancers) at 
screening. Once confirmed to be cancer-free, any subse- 
quent incidence of lung cancer is likely to have come 
from pre-neoplastic lesions either present at screening, 
or which have developed over the course of the study. 
To exclusively monitor development of pre-neoplastic 
lesions, study subjects should also be screened for these 
lesions using AFB and examined by a thoracic surgeon 
upon study commencement. 

Secondly, it was assumed that as the classifiers were 
developed from pre-neoplastic lung tissue, they are 
specific for lung tissue lesions. As discussed above, it is 
evident that the array proteins share common dysfunc- 
tional pathways with other malignancies and hence false 
positives could be detected following application of 
serum from subjects with other types of cancer. There- 
fore, further studies are recommended to investigate the 
specificity of the classifiers for other types of cancer. 

Thirdly, the auto-antibody array detects the presence 
of pre-neoplastic/malignant lesions alone. So far, no 
studies have attempted to correlate the strength of an 
autoantibody signal with the number or severity of le- 
sions. Further studies are required to investigate the cor- 
relation of signal strength with lesion number and 
severity and conclusively determine if lesions are pre- 
neoplastic/malignant or not. 

Given the advantages of the auto-antibody array in
terms of the non-invasiveness of sample collection
(human blood), the higher throughput (compared to
AFB, WLB and spiral CT), and the limited need for experienced
clinicians to interpret the data, coupled with comparable
sensitivity to AFB and higher specificity than
AFB, we propose that this autoantibody array could
be a useful tool in population screening programs to
identify high-risk individuals, and in studies undertaken
to compare the harm reduction potential of combustible
and non-combustible tobacco products and
nicotine delivery devices. Furthermore, a combination
approach utilizing this auto-antibody array in conjunction
with imaging technology could help to overcome
potential cross-reactivity with other types of
tumor.



Conclusions 

Our results showed promising accuracies for detection
of pre-neoplastic lung lesions. This sensitive serologic
test could be used as a tool for early diagnosis or
screening of lung cancer. It could also be used in concert
with radiographic imaging and other diagnostic
strategies to facilitate earlier clinical intervention and
provide useful information for assessing lung cancer risk
in human populations.

Materials and methods 

Study population 

Following informed consent, 1500 high-risk patients
(age > 55, smoking > 20 pack-years) were registered at
Hebei University Affiliated Hospital between 2002 and
2007. Serum samples were collected from all patients
before any procedures were performed. All patients
then underwent autofluorescence bronchoscopy (AFB)
and spiral computed tomography (SCT) scanning.
Biopsies were taken from all abnormal and suspicious
areas during the bronchoscopy. Suspicious nodules
found by SCT were surgically removed. All tissue
specimens, whether from biopsies or surgeries, were
evaluated by pathologists. Confirmed SCD or AAH cases
were selected for our study. Three pieces of AAH and
three pieces of SCD tissue were collected from six
patients and immediately submerged in RNAlater buffer
(Qiagen) for cDNA library construction. In addition,
600 serum samples from 150 patients with AAH, 150
patients with SCD and 300 risk-matched control
individuals were obtained through protocols approved
by the Institutional Review Board. Standard
venipuncture technique was used to draw peripheral
blood into 10-ml glass red top tubes, and samples were
left to stand for 30 minutes. Sera were then separated
by centrifugation, and an aliquot was taken and frozen
at -80°C. Detailed demographic and histopathologic
data for the serum samples are listed in Table 4.

Table 4 Demographic and histopathologic data for serum samples

                     Control       Disease       Control         Disease
                     training set  training set  validation set  validation set
                     No (%)        No (%)        No (%)          No (%)

Sample N =           100           100           200             200
Age range (years)    55-80         57-86         55-80           56-87

Gender
  Male               78 (78%)      79 (79%)      179 (89.5%)     180 (90%)
  Female             12 (12%)      11 (11%)      21 (10.5%)      20 (10%)

Smoking (> 20 py)
  Active             33 (33%)      36 (36%)      69 (34.5%)      73 (36.5%)
  Former             64 (64%)      62 (62%)      127 (63.5%)     124 (62%)
  Never              3 (3%)        2 (2%)        4 (2%)          3 (1.5%)

Histopathology
  AAH                0             50            0               100
  BAC/AAH            0             0             0               39/100 (39%)
  SCD                0             50            0               100
  CIS/SCD            0             0             0               42/100 (42%)

RNA extraction and T7 phage display library construction 

Two T7-phage cDNA libraries were constructed using
tissues from AAH or SCD patients. Total RNA was
extracted and purified using the RNeasy Mini Kit
(Qiagen). PolyA mRNA was isolated from total RNA with
the Oligotex Direct mRNA Mini Kit (Qiagen), and mRNA
concentration was measured with the mRNA
Concentration Module (Invitrogen). Equal amounts of
mRNA from either 3 AAH or 3 SCD tissue samples were
pooled and reverse transcribed to cDNA with random
primers, ligated with EcoRI/HindIII linkers, and cloned
into T7Select 10-3b vector arms using the T7Select
system (Novagen) according to the manufacturer's
instructions. After in vitro packaging, a phage library
was obtained, and the titer of the phage library was
tested using a plaque assay to determine the number of
recombinants (inserts) generated within the library.

Biopanning enrichment of disease associated phage 
clones 

In order to enhance the selection of tumor-associated
proteins, the pooled AAH and SCD phage library was
biopanned using pooled AAH/SCD and normal serum
samples. To remove non-tumor proteins, the library was
affinity selected against antibodies in 6 pooled normal
sera (250 µl pooled normal serum, 1:20 dilution; 4°C
overnight) bound to protein G agarose beads. Unbound
phages were separated from bound phages by
centrifugation. Phages expressing immunogenic
tumor-associated proteins were then selected with
antibodies in serum pooled from 3 AAH and 3 SCD
patients, similarly bound to protein G agarose beads,
and separated from unbound phages by centrifugation.
The bound/reactive phages were eluted with 1% SDS
and centrifugation. The eluted phage library thus
expressed tumor proteins that had higher reactivity with
antibodies found in AAH/SCD serum than with those
found in normal serum. This process was repeated four
times with amplification in bacteria (E. coli BLT5615)
between each biopan. The phages from biopan 4 were
used to infect bacteria and grown on agar plates in
limiting dilutions to determine titer and allow isolation
of individual phage clones. Individual "plaques" on the
agar plate, representing growth from a single phage
clone, were usually seen at dilutions of 10^ to 10^.
Phage colonies plated to limiting dilution were
transferred to nitrocellulose membranes by plaque lift
to be evaluated for reactivity with serum antibodies
using standard immunodetection.

To confirm the enrichment achieved by biopanning, the
outputs of biopans 1-4 (BP1-BP4) were plated onto
LB-Agar plates (in limiting dilution) and plaque lifts
were performed. The plaque lift nitrocellulose
membranes were then incubated with pooled AAH/SCD
patient serum (1:2000) followed by anti-human
HRP-conjugated secondary Ab (1:1000) and detected with
ECL chemiluminescence (Amersham). Immunodetection of
each of the four biopans revealed an increasing number
of reactive phage clones from BP1 to BP4, illustrating
the ability of the sequential biopans to enrich the
concentration of tumor-associated proteins recognized
by antibodies in patient serum. To confirm the specific
selection of disease-associated proteins, biopan 4 was
plated onto LB-Agar plates and plaque lifts were
performed twice with nylon disk membranes. One of the
plaque lift membranes was then incubated with pooled
AAH/SCD patient serum (1:2000) and the other was
incubated with pooled normal sera, followed by
anti-human HRP-conjugated secondary Ab (1:1000) and
detected with ECL chemiluminescence (Amersham).

Microarray high-throughput screening 

After four cycles of biopanning, 4,000 individual phage
colonies were picked from the biopanned library and
inoculated into 96-well plates containing 150 µl E. coli
BLT5615 cells in each well, then incubated at 37°C for
3 hrs. After the incubation, 30 µl supernatant from each
well was transferred into 384-well plates. "Empty T7"
phage (T7 phage without inserts) clones were also
incorporated into these 384-well plates as an internal
control. An OmniGrid 100 (Genomic Solutions)
microarray spotting machine was used to spot 5 nl phage
lysates from each well of the 384-well plates onto
nitrocellulose-coated membrane slides (Schott,
Germany). In order to track the sample locations of each
spot, a GenePix Array List (GAL) file was generated
during spotting.

Five individual AAH or five SCD patient serum samples
that were not used in the biopanning were used to
identify possible tumor-associated proteins from the
screening slides. Rabbit anti-T7 primary antibody was
used to detect T7 capsid proteins as an internal control
for total protein. Both preabsorbed plasma
(serum:bacterial lysate, 1:30) samples and anti-T7
antibodies were diluted 1:3,000 with 4% dry milk in TBST
(1x TBS plus 0.1% Tween 20) and incubated with the
screening slides for 1 hr at room temperature. Slides
were washed and then probed with Cy5-labeled
anti-human and Cy3-labeled anti-rabbit secondary
antibodies, each diluted 1:4,000 in 1x TBST, together
for 1 hr at room temperature. Slides were washed again
and then scanned with a GenePix 4000B scanner.

Images were analyzed with GenePix 5.0 software
(Axon Instruments/Molecular Devices, Union City, CA).
A linear regression of the Cy5:Cy3 signals was generated
using the same software. Phage clones whose Cy5:Cy3
signal ratios lay more than 2 standard deviations from
the linear regression were selected as candidates for use
in the protein array.
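
The selection rule above can be sketched in a few lines of Python. This is an illustration only, not the GenePix procedure itself: the function name `select_reactive_clones`, the ordinary least-squares fit of Cy5 on Cy3, and the choice to keep only spots lying above the fitted line (stronger human-antibody signal than the T7 capsid signal predicts) are assumptions made for the example.

```python
from statistics import mean, stdev


def select_reactive_clones(cy3, cy5, n_sd=2.0):
    """Return indices of spots whose Cy5 signal lies more than n_sd
    residual standard deviations above the least-squares line Cy5 ~ Cy3.

    cy3, cy5: equal-length lists of per-spot median intensities.
    Keeping only positive residuals is an assumption of this sketch.
    """
    mean_x, mean_y = mean(cy3), mean(cy5)
    sxx = sum((x - mean_x) ** 2 for x in cy3)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cy3, cy5))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x

    # Vertical distance of every spot from the fitted line.
    residuals = [y - (intercept + slope * x) for x, y in zip(cy3, cy5)]
    sd = stdev(residuals)
    return [i for i, r in enumerate(residuals) if r > n_sd * sd]
```

A robust fit, or a threshold on both sides of the line, would be equally defensible readings of the text; the cutoff `n_sd` is exposed as a parameter for that reason.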

Testing serum samples using diagnostic chips 

Four hundred immune-reactive phages identified by
high-throughput screening (described above), plus 200
"empty" T7 phages, were combined, reamplified, and
spotted in duplicate onto FAST slides as diagnostic
chips. Replicate chips were used to assess 50 AAH or 50
SCD serum samples along with 100 control serum samples
as a training group, according to the protocol described
above for screening. The classifiers developed in the
training group were then further validated with an
independent blinded sample cohort as a validation group.
A cohort of 400 serum samples consisting of 200
controls, 100 AAH and 100 SCD serum samples was tested
using the same "diagnostic chips", and each sample's
status was calculated separately using the AAH or SCD
classifiers. The predicted statuses were compared with
the true statuses of the samples, and the sensitivities
and specificities were then calculated.
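
For reference, the final comparison step reduces to counting agreements between predicted and true status. The sketch below assumes statuses are coded 1 for disease and 0 for control; the function name and the toy data are hypothetical and are not study results.

```python
def sensitivity_specificity(true_status, predicted_status):
    """Compute (sensitivity, specificity) from paired 0/1 labels.

    Sensitivity = TP / (TP + FN); specificity = TN / (TN + FP).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(true_status, predicted_status):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 0:
                tn += 1
            else:
                fp += 1
    return tp / (tp + fn), tn / (tn + fp)


# Toy example: 5 disease samples and 5 controls (hypothetical labels).
truth = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
calls = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
sens, spec = sensitivity_specificity(truth, calls)
```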

Data processing and statistical model construction 

The median Cy5 signal was normalized to the median
Cy3 signal (Cy5:Cy3 signal ratio) as the measurement
of human antibody against a unique phage-expressed
protein. To compensate for chip-to-chip variability,
measurements were further normalized by subtracting the
background reactivity of plasma against empty T7 phage
proteins and dividing by the median of the T7 signal
[(Cy5:Cy3 of phage - Cy5:Cy3 of T7)/Cy5:Cy3 of T7].
The normalized Cy5:Cy3 ratio for each phage clone was
independently analyzed for statistical significance
between the patient (50 AAH or 50 SCD) samples and 100
control serum samples by t-test, using JMP statistical
software (SAS, Inc., Cary, NC). Candidate phage markers
were chosen if P < 0.01 and were tested in different
combinations for classifier development. Logistic
regression (LR) was used to develop statistical
models/classifiers with maximal accuracy for disease
prediction.
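
The chip-to-chip normalization described above can be written out as a short function. This is a sketch under stated assumptions: the name `normalize_chip`, the dictionary layout (clone identifier mapped to its Cy5:Cy3 ratio), and the use of the median ratio of the empty T7 spots on the same chip as the background term are choices made for illustration.

```python
from statistics import median


def normalize_chip(phage_ratios, empty_t7_ratios):
    """Apply (ratio_phage - ratio_T7) / ratio_T7 to every clone on one chip.

    phage_ratios: dict mapping clone id -> Cy5:Cy3 ratio of that spot.
    empty_t7_ratios: Cy5:Cy3 ratios of the empty T7 control spots.
    The median of the empty T7 spots is used as ratio_T7 (assumption).
    """
    t7 = median(empty_t7_ratios)
    return {clone: (ratio - t7) / t7 for clone, ratio in phage_ratios.items()}
```

With this scaling, a clone that behaves like empty T7 phage scores 0, and a score of 2 means the clone's ratio is three times the empty T7 background.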

The calculation was carried out using SAS statistical 
software. Weights for the relative importance of each 
protein were developed and incorporated into the 
model. ROC curves based on logistic regression or an 
alternative method were used to determine an optimal 
(high sensitivity and specificity) decision rule for disease 
diagnosis. The classifiers were further examined by 
leave-one-out cross-validation. 
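
To make the classifier step concrete, the following pure-Python sketch fits a logistic regression on normalized marker values and checks it by leave-one-out cross-validation. It is a stand-in for the SAS analysis, not a reproduction of it: the batch gradient-descent fit, the small L2 penalty, the learning rate, the 0.5 decision threshold, and all function names are assumptions introduced for the example.

```python
import math


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000, l2=0.01):
    """Fit weights and intercept by batch gradient descent.

    X: list of feature rows (one normalized value per marker).
    y: list of 0/1 labels (1 = disease). Hyperparameters are assumptions.
    """
    n, k = len(X), len(X[0])
    w, b = [0.0] * k, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * k, 0.0
        for xi, yi in zip(X, y):
            err = _sigmoid(b + sum(wj * xj for wj, xj in zip(w, xi))) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * (gj / n + l2 * wj) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(model, x):
    """Predicted probability of disease for one feature row."""
    w, b = model
    return _sigmoid(b + sum(wj * xj for wj, xj in zip(w, x)))


def leave_one_out_accuracy(X, y, threshold=0.5):
    """Refit with each sample held out in turn and score the held-out call."""
    correct = 0
    for i in range(len(X)):
        model = fit_logistic(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        pred = 1 if predict_proba(model, X[i]) >= threshold else 0
        correct += int(pred == y[i])
    return correct / len(X)
```

The fitted weights play the role of the per-protein weights mentioned above. In practice the decision threshold would be read off the ROC curve rather than fixed at 0.5.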

Sequence identification 

The phage cDNA inserts used in the classifiers were
identified by PCR amplification using commercially
available T7 phage vector primers (Novagen, USA).
The primer sequences are: T7 up
5' - GGAGCTGTCGTATTCCAGTC - 3' and T7 down
5' - AACCCCTCAAGACCCGTTTA - 3'. The PCR products
were purified and sequenced, and the resulting sequences
were identified in the GenBank database using the BLAST
search program.